Implement a new API that switches between asm and HIP PA internally (#1821)
Merged
Conversation
Inference engines should now call paged_attention_common with the shuffled KV-cache layout; aiter will internally decide between the asm and HIP kernel. HIP is more performant at lower concurrencies (< 128). A unit test has also been updated to cover the new interface. Note that shuffled scales are not supported in the HIP kernel, so int8 and fp8 KV caches are always redirected to asm.
JohnNikolay84 commented on Jan 12, 2026
fsx950223 reviewed on Jan 13, 2026
Contributor
@copilot Does this PR pass the paged attention unit test?
Contributor
@fsx950223 I've opened a new pull request, #1848, to work on those changes. Once the pull request is ready, I'll request review from you.
fsx950223 approved these changes on Jan 15, 2026
valarLip approved these changes on Jan 16, 2026
yzhou103 pushed a commit that referenced this pull request on Jan 28, 2026
…ernally (#1821)

* Implement a new api that will be switching between asm and hip pa

  Inference engines should be calling paged_attention_common now with shuffled kv cache layout and aiter internally will decide between asm or hip kernel. HIP is more performant for lower concurrencies (< 128). Also a unit test has been updated to include the new interface. Note that support for the shuffled scales in HIP is not supported and is always redirected to asm now when KV cache is in int8 or fp8 formats.

* Delete op_tests/README_pa_merged_tests.md
* Delete op_tests/test_pa_merged.py
* Fix formatting according to Black requirements
* Fix one last place with broken formatting
* Remove modification to pa_v1, we already have pa for 5D kv cache
* Fix another formatting issue
* Add proper quant support for the common API
* Apply formatting
* Remove redundant parameters
* Remove redundant parameters

Co-authored-by: Sergey Solo <ssolovye@amd.com>
Co-authored-by: Mikko Tukiainen <mikko.tukiainen@amd.com>
valarLip pushed a commit that referenced this pull request on Mar 18, 2026
…ernally (#1821)

* Implement a new api that will be switching between asm and hip pa

  Inference engines should be calling paged_attention_common now with shuffled kv cache layout and aiter internally will decide between asm or hip kernel. HIP is more performant for lower concurrencies (< 128). Also a unit test has been updated to include the new interface. Note that support for the shuffled scales in HIP is not supported and is always redirected to asm now when KV cache is in int8 or fp8 formats.

* Delete op_tests/README_pa_merged_tests.md
* Delete op_tests/test_pa_merged.py
* Fix formatting according to Black requirements
* Fix one last place with broken formatting
* Remove modification to pa_v1, we already have pa for 5D kv cache
* Fix another formatting issue
* Add proper quant support for the common API
* Apply formatting
* Remove redundant parameters
* Remove redundant parameters

Co-authored-by: Sergey Solo <ssolovye@amd.com>
Co-authored-by: Mikko Tukiainen <mikko.tukiainen@amd.com>
Inference engines should now call paged_attention_common with the shuffled KV-cache layout; aiter will internally decide between the asm and HIP kernel. HIP is more performant at lower concurrencies (< 128). The test_pa.py unit test has also been updated to cover the new interface.
Since the asm and HIP kernels require scales in different layouts, clients of this API are expected to provide both: a per-K/V-tensor scale value for HIP and an expanded per-block scale for asm.
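To illustrate the dual-scale requirement, the sketch below expands a per-tensor scale (the layout the HIP kernel consumes) into a per-block array (the layout the asm kernel consumes). The cache geometry and the simple broadcast rule here are assumptions for demonstration only, not aiter's actual scale layout.

```python
import numpy as np

# Hypothetical cache geometry, for illustration only.
num_blocks, num_kv_heads = 8, 4

# HIP variant: a single scale value per K (and per V) tensor.
k_scale_hip = np.float32(0.125)

# asm variant: an expanded per-block scale. A minimal way to derive it from
# the per-tensor value is to broadcast it across every block and KV head.
k_scale_asm = np.full((num_blocks, num_kv_heads), k_scale_hip, dtype=np.float32)

print(k_scale_asm.shape)         # (8, 4)
print(float(k_scale_asm[0, 0]))  # 0.125
```

A client would pass both representations along with the call, so that whichever kernel aiter selects finds scales in the layout it expects.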
Motivation
vLLM has had a binary choice between the HIP (C++) and asm kernels provided by the aiter framework. The HIP kernel is more performant at lower concurrencies; the asm kernel is more performant at higher concurrencies. It would be great if aiter could pick the best kernel depending on the input.
Technical Details
The HIP PA kernel under csrc/cpp_itfs now supports a variant where kv_cache is provided as a 5D tensor (asm style).
aiter internally switches between the HIP and asm kernels based on expected concurrency.
Inference engines should call the new API, paged_attention_common, with asm-shuffled KV-cache layouts, and also provide scratch space in case the 2-stage HIP kernel is selected.
No legacy API is affected.
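The selection behavior described above can be sketched as a small dispatch function. The function name, the placement of the 128-concurrency threshold, and the dtype strings are reconstructed from this description and should be treated as illustrative, not as aiter's real internals.

```python
def select_pa_kernel(num_concurrent_seqs: int, kv_cache_dtype: str) -> str:
    """Illustrative kernel choice, reconstructed from the PR description.

    - Quantized (int8/fp8) KV caches always use asm, because shuffled
      scales are not supported by the HIP kernel.
    - Otherwise HIP is preferred below ~128 concurrent sequences,
      asm at or above it.
    """
    if kv_cache_dtype in ("int8", "fp8"):
        return "asm"
    return "hip" if num_concurrent_seqs < 128 else "asm"

print(select_pa_kernel(32, "bf16"))   # hip
print(select_pa_kernel(256, "bf16"))  # asm
print(select_pa_kernel(32, "fp8"))    # asm
```

Keeping this decision inside aiter means inference engines supply one layout (plus scratch space) and get the better kernel automatically as concurrency varies.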
Test Plan
Run the qwen3 model and make sure accuracy has not regressed and performance has improved.
Create a new aiter unit test for the new API and make sure it passes.
Test Result
25% performance improvement for concurrency > 64 on qwen3
8% overall improvement for Qwen_Qwen3-235B-A22B in geomean-OTPS